Bioinformatics (Thomas Dandekar, Meik Kunz)


contrast, methods that first determine whether the protein sequence is similar enough to a known structure and then predict the 3-D structure by “copying” it are surprisingly powerful, owing to the sheer size of the data (tens of thousands of known protein structures with their x-y-z coordinates in the Protein Data Bank, PDB).
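The template-based approach above hinges on one decision: is any known structure similar enough to the query sequence to be copied? A minimal sketch of that decision follows. It assumes a plain Needleman-Wunsch global alignment with simple +1/-1/-1 match/mismatch/gap scores and a hypothetical 30 % identity cut-off; the function names `global_identity` and `pick_template` are illustrative only. Real modelling pipelines use substitution matrices, profile searches and more carefully calibrated thresholds.

```python
# Sketch of template selection for "copying"-based (homology) modelling.
# Assumptions (not from the text): +1/-1/-1 match/mismatch/gap scores and a
# hypothetical 30 % identity cut-off for accepting a template.

def global_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment of a and b; returns percent identity
    (identical aligned positions divided by alignment length)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, counting identities and columns
    i, j, identical, length = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            pair = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + pair:
                identical += int(a[i - 1] == b[j - 1])
                i, j = i - 1, j - 1
                length += 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap inserted in b
        else:
            j -= 1  # gap inserted in a
        length += 1
    return 100.0 * identical / length if length else 0.0


def pick_template(query, templates, cutoff=30.0):
    """Return (template_id, identity) of the best-matching template,
    or None if even the best one is below the cut-off."""
    scored = [(tid, global_identity(query, seq)) for tid, seq in templates.items()]
    if not scored:
        return None
    best = max(scored, key=lambda item: item[1])
    return best if best[1] >= cutoff else None
```

For example, with the made-up sequences `pick_template("MKTAYIAKQL", {"T1": "MKTAYIAKQR", "T2": "GGGGGGGGGG"})` returns `("T1", 90.0)`, whereas a query with no sufficiently similar template returns `None`, signalling that copying a structure is not justified.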

Then, once the protein has been completed up to that point, its stability is determined in part by the amino acids at its C-terminus, which are encoded by the codons at the 3′ end of the coding sequence. For example, specific instability sequences at the C-terminus of the protein determine how stable it is.

In general, bioinformatics for deciphering these codes almost always starts with the sequence and then draws on other features: above all the structure, but also, for RNA for example, the folding energy. For proteins, structure prediction is still computationally time-intensive and very difficult for new fold types. Likewise, the codes of transcription, of DNA control sequences and even of new types of RNA are only partially understood; for lncRNAs (long non-coding RNAs) and miRNAs (microRNAs), for example, one must correctly predict their targets. On the other hand, increasingly complete large datasets of total transcription from a wide variety of cell types are becoming available, gradually supplemented by proteome and metabolite datasets.
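Target prediction for miRNAs, mentioned above, often starts from a well-known simplification: the miRNA “seed” (nucleotides 2-8 from its 5′ end) pairs with a complementary site in the target mRNA, typically in the 3′ UTR. The sketch below assumes only perfect Watson-Crick 7-mer seed matches and uses toy sequences; `seed_sites` and `reverse_complement` are illustrative names, not a specific tool's API. Real predictors additionally weigh site type, sequence context, conservation and pairing energy.

```python
# Sketch of seed-based miRNA target-site search (strongly simplified).
# Assumption: only perfect Watson-Crick 7-mer matches to miRNA nucleotides 2-8
# are reported; G:U wobble pairs, site context and energy are ignored.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string (A/C/G/U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna, utr):
    """0-based start positions in the 3' UTR that pair perfectly with the
    miRNA seed (nucleotides 2-8, counted 1-based from the 5' end)."""
    seed = mirna[1:8]
    site = reverse_complement(seed)
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]
```

With the toy inputs `seed_sites("UGAGGUAGUAGGUUGUAUAGUU", "AAACUACCUCAAAGGCUACCUCUU")`, the seed `GAGGUAG` corresponds to the site `CUACCUC`, which is found at positions `[3, 15]`.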

13.2 New Molecular, Cellular and Intercellular Levels and Types of Language Are Emerging All the Time

The exciting thing, however, is that these types of language are only the beginning. At the molecular level, for example, there is also a sugar code (glycosylations and the proteins that bind these sugar residues, the so-called lectins). Among other things, it regulates which cells come together to form tissue associations, and it is simply ignored by the affected cancer cells when metastases form. There are also other codes for cell-cell communication (lipids, desmosomes and so on), until we finally arrive at one of the most complex systems of all, the immune system, which in each of us performs the task of reliably distinguishing between self and foreign. There is already a great deal of data on the immune system, for example on the white blood cells, among which we can distinguish lymphocytes (antibody-producing B cells; directly defending T cells, which are subdivided into helper cells, CD8 T cells and ever new subtypes; and natural killer cells), and on other defense cells, in particular monocytes, dendritic cells and macrophages. But that is only the beginning. Immunologists distinguish very fine subtypes depending on the surface receptors that white blood cells carry and on their specific subfunctions. In addition, platelets also support the immune response. We study these cell types intensively and find that, again, a separate systems biology model can be built for each of these defense cells. The linguistic diversity and complex coding of the various immune responses are surpassed in complexity only by our nervous system. The various codes and language levels of both systems have been deciphered only in rough outline, so many open questions and exciting secrets are still waiting to be deciphered.
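Telling white blood cell subtypes apart by their surface receptors is, in practice, often done by immunophenotyping (for example with flow cytometry). A deliberately coarse, rule-based sketch of the idea follows. It assumes binary (present/absent) marker calls and a tiny marker panel (CD3, CD4, CD8, CD19, CD56, CD14); `classify_cell` is a hypothetical helper, and real gating strategies use many more markers and intensity thresholds.

```python
# Coarse, rule-based assignment of a white blood cell to a type from its
# surface markers (CD antigens). Assumptions: marker calls are already binary
# (present/absent) and the panel below is far smaller than in real practice.

def classify_cell(markers):
    """markers: set of CD antigens detected on the cell surface."""
    if "CD3" in markers:          # CD3 complex: T cells
        if "CD4" in markers:
            return "helper T cell"
        if "CD8" in markers:
            return "CD8 T cell"
        return "other T cell"
    if "CD19" in markers:         # B-cell marker
        return "B cell"
    if "CD56" in markers:         # CD3-negative, CD56-positive: NK cells
        return "natural killer cell"
    if "CD14" in markers:         # monocyte marker
        return "monocyte"
    return "unclassified"
```

For example, a cell carrying `{"CD3", "CD8"}` is classified as a CD8 T cell and one carrying only `{"CD56"}` as a natural killer cell. Each finer subtype would add further rules, which is one reason the number of distinguishable subtypes keeps growing.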

In evolutionary terms, the different levels of the languages of life can be summarized in simplified form as shown in the box: Starting from preforms of life (about 3.3 billion years ago), as is still the

13 Life Invents Ever New Levels of Language